Given the predicted softmax outputs (probabilities) $p_i$ and per-class weights $w_i$, which are either ground-truth soft labels or free-form weights, the losses are:
weighted softmax loss: $-\sum_{i} w_i \log p_i$
EMD (earth mover's distance) softmax loss: $-\sum_{i} w_i p_i$
softmax loss after a label-flip layer: $-\log \sum_{i} w_i p_i$
knowledge distillation: $\sum_{i} (p_i-w_i)^2$
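The formulas above can be sketched in pure Python for a single example. This is a minimal illustration, not a reference implementation. It assumes $p$ comes from a numerically stable softmax over logits $z$ and that $w$ is a list of the same length. The function names are hypothetical. For knowledge distillation the squared-error form $\sum_i (p_i - w_i)^2$ is assumed, because the unsquared sum $\sum_i (p_i - w_i)$ is identically zero whenever both $p$ and $w$ sum to one.

```python
import math


def softmax(z):
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def weighted_softmax_loss(p, w):
    # -sum_i w_i log p_i  (requires every p_i > 0, which softmax guarantees)
    return -sum(wi * math.log(pi) for pi, wi in zip(p, w))


def emd_softmax_loss(p, w):
    # -sum_i w_i p_i
    return -sum(wi * pi for pi, wi in zip(p, w))


def label_flip_loss(p, w):
    # -log sum_i w_i p_i  (requires the weighted sum to be positive)
    return -math.log(sum(wi * pi for pi, wi in zip(p, w)))


def distillation_loss(p, w):
    # Assumed squared-error form: sum_i (p_i - w_i)^2
    return sum((pi - wi) ** 2 for pi, wi in zip(p, w))


if __name__ == "__main__":
    p = softmax([2.0, 1.0, 0.1])
    w = [1.0, 0.0, 0.0]  # one-hot target for class 0
    print(weighted_softmax_loss(p, w), emd_softmax_loss(p, w),
          label_flip_loss(p, w), distillation_loss(p, w))
```

With a one-hot $w$, the weighted softmax loss and the label-flip loss both reduce to the ordinary cross-entropy $-\log p_y$, and the EMD loss reduces to $-p_y$.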